11 research outputs found

    Efficient checkpoint/verification patterns

    Errors have become a critical problem for high performance computing. Checkpointing protocols are often used for error recovery after fail-stop failures. However, silent errors cannot be ignored, and their peculiarity is that such errors are identified only when the corrupted data is activated. To cope with silent errors, we need a verification mechanism to check whether the application state is correct. Checkpoints should be supplemented with verifications to detect silent errors. When a verification is successful, only the last checkpoint needs to be kept in memory because it is known to be correct. In this paper, we analytically determine the best balance of verifications and checkpoints so as to optimize platform throughput. We introduce a balanced algorithm using a pattern with p checkpoints and q verifications, which regularly interleaves both checkpoints and verifications across same-size computational chunks. We show how to compute the waste of an arbitrary pattern, and we prove that the balanced algorithm is optimal when the platform MTBF (Mean Time Between Failures) is large compared to the other parameters (checkpointing, verification and recovery costs). We conduct several simulations to show the gain achieved by this balanced algorithm for well-chosen values of p and q, compared to the base algorithm that always performs a verification just before taking a checkpoint (p = q = 1), and we exhibit gains of up to 19%.
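
    A quick way to build intuition for the waste computation described above is a small Monte Carlo simulation. The sketch below is not the paper's balanced algorithm or its analytical formula: it simulates only the simpler p = 1 family of patterns (one checkpoint, q verifications per pattern) under exponentially distributed silent errors, with hypothetical cost parameters C, V, R and MTBF mu, and estimates the resulting waste.

```python
import random

def pattern_waste(total_work, W, q, C, V, R, mu, seed=42):
    """Estimate the waste of a periodic pattern with one checkpoint and q
    verifications (a p = 1 simplification of the patterns discussed above).

    Each pattern performs W units of work in q equal chunks, verifies
    (cost V) after every chunk, and checkpoints (cost C) once at the end.
    Silent errors strike computation at exponential rate 1/mu; the next
    verification detects them, after which a recovery (cost R) restores
    the last checkpoint and the pattern restarts from its beginning.
    Simplification: errors never strike verifications, checkpoints or
    recoveries.
    """
    rng = random.Random(seed)
    chunk = W / q
    elapsed, done = 0.0, 0.0
    while done < total_work:
        for i in range(q):
            if rng.expovariate(1.0 / mu) < chunk:
                # Error in chunk i: detected by the verification closing the
                # chunk; recover and re-execute the whole pattern.
                elapsed += (i + 1) * (chunk + V) + R
                break
        else:
            # Error-free pattern: take the checkpoint and commit the work.
            elapsed += q * (chunk + V) + C
            done += W
    return 1.0 - total_work / elapsed

# Hypothetical parameters (arbitrary time units): MTBF 1000, checkpoint 10,
# verification 2, recovery 10, 100 units of work per pattern.
for q in (1, 2, 4, 8):
    print(q, round(pattern_waste(1e6, 100.0, q, 10.0, 2.0, 10.0, 1000.0), 3))
```

    With these made-up numbers, sweeping q illustrates the trade-off the paper formalizes: more verifications shorten the rollback after an error but add overhead to every pattern.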

    Efficient checkpoint/verification patterns for silent error detection

    Resilience has become a critical problem for high performance computing. Checkpointing protocols are often used for error recovery after fail-stop failures. However, silent errors cannot be ignored, and their peculiarity is that such errors are identified only when the corrupted data is activated. To cope with silent errors, we need a verification mechanism to check whether the application state is correct. Checkpoints should be supplemented with verifications to detect silent errors. When a verification is successful, only the last checkpoint needs to be kept in memory because it is known to be correct. In this paper, we analytically determine the best balance of verifications and checkpoints so as to optimize platform throughput. We introduce a balanced algorithm using a pattern with p checkpoints and q verifications, which regularly interleaves both checkpoints and verifications across same-size computational chunks. We show how to compute the waste of an arbitrary pattern, and we prove that the balanced algorithm is optimal when the platform MTBF (Mean Time Between Failures) is large compared to the other parameters (checkpointing, verification and recovery costs). We conduct several simulations to show the gain achieved by this balanced algorithm for well-chosen values of p and q, compared to the base algorithm that always performs a verification just before taking a checkpoint (p = q = 1), and we exhibit gains of up to 19%.

    Coping with Recall and Precision of Soft Error Detectors

    Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost, recall (the fraction of all errors that are actually detected; missed errors are false negatives), and precision (the fraction of true errors among all detected errors; spurious detections are false positives). The main contribution of this paper is to characterize the optimal computing pattern for an application: which detector(s) to use, how many detectors of each type to use, together with the length of the work segment that precedes each of them. We first prove that detectors with imperfect precision offer limited usefulness. Then we focus on detectors with perfect precision, and we conduct a comprehensive complexity analysis of this optimization problem, showing NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm, whose performance is shown to be close to the optimal for a realistic set of evaluation scenarios. Extensive simulations illustrate the usefulness of detectors with false negatives, which are available at a lower cost than guaranteed detectors.
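
    The cost/recall/precision trade-off above is easier to keep straight with the definitions written out in code. The following is a minimal, self-contained illustration; the function name and the event-id representation are ours, not the paper's.

```python
def recall_and_precision(true_errors, flagged):
    """Recall and precision of a detector, given the set of errors that
    really occurred and the set of events the detector flagged.

    recall    = detected true errors / all true errors    (misses are false negatives)
    precision = detected true errors / all flagged events (spurious flags are false positives)
    """
    true_errors, flagged = set(true_errors), set(flagged)
    true_positives = len(true_errors & flagged)
    recall = true_positives / len(true_errors) if true_errors else 1.0
    precision = true_positives / len(flagged) if flagged else 1.0
    return recall, precision

# A detector that misses error 3 (false negative) and flags event 9 that was
# not an error (false positive): recall = 2/3, precision = 2/3.
print(recall_and_precision({1, 3, 7}, {1, 7, 9}))
```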

    Which Verification for Soft Error Detection?

    Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost and recall (fraction of all errors that are actually detected). The main contribution of this paper is to characterize the optimal computational pattern for an application: which detector(s) to use, how many detectors of each type to use, together with the length of the work segment that precedes each of them. We conduct a comprehensive complexity analysis of this optimization problem, showing NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm whose performance is shown to be close to the optimal for a realistic set of evaluation scenarios.

    Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)

    In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy. Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation, it is imperative to delete or knock down more than one autophagy-related gene. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways, so not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular autophagy assays, we hope to encourage technical innovation in the field.

    Sur l'impact des vérifications partielles face aux corruptions de données silencieuses

    Silent errors, or silent data corruptions, constitute a major threat on very large scale platforms. When a silent error strikes, it is not detected immediately but only after some delay, which prevents the use of pure periodic checkpointing approaches devised for fail-stop errors. Instead, checkpointing must be coupled with some verification mechanism to guarantee that corrupted data will never be written into the checkpoint file. Such a guaranteed verification mechanism typically incurs a high cost. In this paper, we investigate the use of partial verification mechanisms in addition to a guaranteed verification. The main objective is to determine to what extent it is worthwhile to use some low-cost but less precise verification type in the middle of a periodic computing pattern, which ends with a guaranteed verification right before each checkpoint. Introducing partial verifications dramatically complicates the analysis, but we are able to analytically determine the optimal computing pattern (up to first-order approximation), including the optimal length of the pattern, the optimal number of partial verifications, as well as their optimal positions inside the pattern. Simulations based on a wide range of parameters confirm the benefits of partial verifications in certain scenarios, when compared to the baseline algorithm that uses only guaranteed verifications.
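
    As a companion to the abstract above, here is a minimal Monte Carlo sketch of a pattern that mixes partial and guaranteed verifications. It is not the paper's first-order analysis: it assumes k equally spaced partial verifications of recall r and cost v, one guaranteed verification of cost Vg right before the checkpoint, exponentially distributed silent errors, and hypothetical cost values.

```python
import random

def partial_verif_waste(total_work, W, k, r, v, Vg, C, R, mu, seed=1):
    """Waste of a pattern with k partial verifications (recall r, cost v)
    followed by a guaranteed verification (cost Vg) and a checkpoint (cost C).

    The W units of work are split into k + 1 equal segments.  Silent errors
    strike computation at exponential rate 1/mu; once the state is corrupted,
    each later partial verification detects it with probability r, and the
    guaranteed verification detects it for sure, triggering a recovery
    (cost R) and a restart of the pattern.  Errors are assumed to strike
    only during computation.
    """
    rng = random.Random(seed)
    seg = W / (k + 1)
    elapsed, done = 0.0, 0.0
    while done < total_work:
        corrupted, detected, t = False, False, 0.0
        for i in range(k + 1):
            t += seg
            if not corrupted and rng.expovariate(1.0 / mu) < seg:
                corrupted = True
            if i < k:
                t += v                     # partial verification
                if corrupted and rng.random() < r:
                    detected = True
                    break
            else:
                t += Vg                    # guaranteed verification
                detected = corrupted
        if detected:
            elapsed += t + R               # roll back and retry the pattern
        else:
            elapsed += t + C               # clean pattern: checkpoint it
            done += W
    return 1.0 - total_work / elapsed

# Hypothetical costs: partial detector with recall 0.8 and cost 1, guaranteed
# verification 5, checkpoint 10, recovery 10, MTBF 1000, 200 units of work.
for k in (0, 1, 2, 4):
    print(k, round(partial_verif_waste(1e6, 200.0, k, 0.8, 1.0, 5.0, 10.0, 10.0, 1000.0), 3))
```

    The k = 0 case is the guaranteed-only baseline the abstract compares against; larger k shows when cheap, imperfect detectors pay off under these made-up parameters.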

    Efficient checkpoint/verification patterns

    No full text

    Quelle vérification pour la détection d'erreurs silencieuses ?

    Many methods are available to detect silent errors in high-performance computing (HPC) applications. Each comes with a given cost and recall (fraction of all errors that are actually detected). The main contribution of this paper is to show which detector(s) to use, and to characterize the optimal computational pattern for the application: how many detectors of each type to use, together with the length of the work segment that precedes each of them. We conduct a comprehensive complexity analysis of this optimization problem, showing NP-completeness and designing an FPTAS (Fully Polynomial-Time Approximation Scheme). On the practical side, we provide a greedy algorithm whose performance is shown to be close to the optimal for a realistic set of evaluation scenarios.

    Erratum to: Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition) (Autophagy, 12, 1, 1-222, 10.1080/15548627.2015.1100356)

    No full text